A mixture of Gaussians front end for speech recognition

نویسندگان

  • Matthew N. Stuttle
  • Mark J. F. Gales
چکیده

This paper describes a feature extraction technique based on fitting a Gaussian mixture model (GMM) to the speech spectral envelope. The features obtained (the component means, variances and priors) represent both the the general shape of the spectrum and provide information on the position of the spectral peaks. As the features select peaks in the spectrum they are related to the formant amplitudes, locations and bandwidths. Results using the Resource Management corpus, a medium vocabulary task are presented. Although by themselves the GMM features do not outperform MFCC features, systems combining the GMM systems with a standard frontend are shown to give a reduction in word error rate.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A comparative study of Gaussian selection methods in large vocabulary continuous speech recognition

Gaussian mixture models are the most popular probability density used in automatic speech recognition. During decoding, often many Gaussians are evaluated. Only a small number of Gaussians contributes significantly to probability. Several promising methods to select relevant Gaussians are known. These methods have different properties in terms of required memory, overhead and quality of selecte...

متن کامل

N-best based stochastic mapping on stereo HMM for noise robust speech recognition

In this paper we present an extension of our previously proposed feature space stereo-based stochastic mapping (SSM). As distinct from an auxiliary stereo Gaussian mixture model in the front-end in our previous work, a stereo HMM model in the back-end is used. The basic idea, as in feature space SSM, is to form a joint space of the clean and noisy features, but to train a Gaussian mixture HMM i...

متن کامل

A Comparative Study of Gauss in Large Vocabulary Continuou

Gaussian mixture models are the most popular probability density used in automatic speech recognition. During decoding, often many Gaussians are evaluated. Only a small number of Gaussians contributes significantly to probability. Several promising methods to select relevant Gaussians are known. These methods have different properties in terms of required memory, overhead and quality of selecte...

متن کامل

Reduced gaussian mixture models in a large vocabulary continuous speech recognizer

Large vocabulary continuous speech recognition (LVCSR) systems usually employ several tens of thousands of gaussian mixture components for an accurate statistical representation of naturally spoken human speech. For applications that cannot e ort the computational expensive evaluation of numerous Gaussians during recognition time, it is an important question whether the number of Gaussians can ...

متن کامل

The bucket box intersection (BBI) algorithm for fast approximative evaluation of diagonal mixture Gaussians

Today, most of the state-of-the-art speech recognizers are based on Hidden Markov modeling. Using semi-continuous or continuous density Hidden Markov Models, the computation of emission probabilities requires the evaluation of mixture Gaussian probability density functions. Since it is very expensive to evaluate all the Gaussians of the mixture density codebook, many recognizers only compute th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001